Abstract

In this paper we characterize a mathematical model called the Maximum
Common Subelement (MCS) Model and prove the existence of four different
metrics on such a model. We generalize metrics on graphs previously
proposed in the literature and identify new ones by showing three different 
examples of MCS Models on graphs based on (1) subgraphs, (2) induced 
subgraphs and (3) an extended notion of subgraphs. This latter example 
can be used to model graphs with complex labels (e.g., graphs whose labels 
are other graphs), and hence to derive metrics on them. Furthermore, we 
also use (3) to show that graph edit distance, when a metric, is related to 
a maximum common subelement in a corresponding MCS Model. 


1 Introduction 

Graphs are a natural model for a number of concepts in many different domains 
such as molecules in chemistry, interaction networks in social studies and
biochemistry, workflow descriptions in scientific computing, just to name a few.
In each of these domains, when dealing with collections of such objects, it is 
usually important to have a precise notion of similarity/dissimilarity between 
them. An adequate and precise way to define similarity/dissimilarity between 
graphs is by means of a metric on (the set of) graphs. 

Bunke and Shearer [2] showed that the function

$$d_B(g_1, g_2) = 1 - \frac{v_{12i}}{\max\{v_1, v_2\}}$$

is a metric on the set of graphs when $g_1$ and $g_2$ are graphs with, respectively,
$v_1$ and $v_2$ vertices, and $v_{12i}$ is the maximum number of vertices of a common
induced subgraph of $g_1$ and $g_2$. Later, Wallis et al. [7] showed that, by
rearranging the same terms, the function

$$d_W(g_1, g_2) = 1 - \frac{v_{12i}}{v_1 + v_2 - v_{12i}}$$

is also a metric on the set of graphs. We say that these two metrics are based
on induced subgraphs because the term $v_{12i}$ is related to a common induced
subgraph of the input graphs $g_1$ and $g_2$.

An initial motivation for this work was to identify metrics on the set of graphs
based on subgraphs instead of induced subgraphs that would be analogous to $d_B$
and $d_W$. Note in Figure 1 that, for the same two graphs, the largest number of
vertices of a common subgraph and of a common induced subgraph can be
significantly different. Consequently, depending on the application, the graph
similarity/dissimilarity notion is better modeled either by a function based on
subgraphs or by one based on induced subgraphs. One application where a function
based on subgraphs is a better fit is reported by [5]. In their paper, they argue
that a common subgraph (not necessarily an induced one) that has the largest
number of edges is a better model for the similarity of chemical graphs since, in
their words, “it is the bonded interactions between atoms in a molecule that are
the most responsible for its perceived activity”. In this application for chemical
graphs, analogous versions of $d_B$ and $d_W$ based on subgraphs would be more
adequate.

One metric on the set of graphs based on subgraphs was shown by Fernandez
and Valiente [4]. Their function is equivalent to the following definition:

$$d_F(g_1, g_2) = (v_1 + e_1) + (v_2 + e_2) - 2 (v_{12s} + e_{12s}),$$

where the new terms $e_1$ and $e_2$ are the number of edges of $g_1$ and $g_2$, and
$v_{12s}$ and $e_{12s}$ are the number of vertices and edges of a common subgraph of
$g_1$ and $g_2$ that maximizes the sum of the number of vertices and the number of
edges among all common subgraphs of $g_1$ and $g_2$.

In this paper we characterize a mathematical structure called the Maximum
Common Subelement (MCS) Model (Section 3), which generalizes the one described
by [3], and show that four metrics are valid in such a model (Theorem 1),
including general analogous versions of the functions $d_B$, $d_W$, and $d_F$. We
then show three examples of MCS Models on graphs. The first two examples are
based on the usual notions of subgraphs (Section 4.2) and induced subgraphs
(Section 4.3), and the third example is based on a notion of extended subgraphs
(Section 4.4). We refer to these three MCS Models on graphs as, respectively,
S-MCS Model, I-MCS Model, and E-MCS Model. The importance of these
MCS Models on graphs is that they enable us to reproduce previous metrics on
graphs (e.g., $d_B$, $d_W$, $d_F$), extend them (weighting scheme), and derive new
ones (e.g., analogues of $d_B$ and $d_W$ based on subgraphs, the metrics on the
E-MCS Model).

One interesting aspect of the E-MCS Model is that the (vertex and edge)
labels of its graphs are elements of other MCS Models. This permits an E-MCS
Model to describe rich structured objects (e.g., graphs whose labels are other
graphs) and similarity models on them (i.e., the general MCS Model metrics are
readily available for these rich structured objects). In Section 5 we use E-MCS
Models to show that for any graph edit distance that is a metric on graphs, we
can derive a corresponding MCS Model where the edit distance of two graphs
is related to the size of a maximum common subelement of the two graphs in
this corresponding MCS Model.

2 Preliminaries 

For the sake of completeness, in this section we state some standard concepts 
that are fundamental for the rest of the paper. 

Definition 1. (Metric, Metric Space) A metric $d$ on a set $X$ is a function
$d : X \times X \to [0, \infty)$ such that, for any $x_1, x_2, x_3 \in X$, the
following conditions hold:

(M1) $d(x_1, x_1) = 0$;

(M2) $d(x_1, x_2) = d(x_2, x_1)$;

(M3) $d(x_1, x_3) \le d(x_1, x_2) + d(x_2, x_3)$;

(M4) if $d(x_1, x_2) = 0$ then $x_1 = x_2$.

In this case, the pair $(X, d)$ is called a metric space. If $X$ is finite then we also
refer to $(X, d)$ as a finite metric space.
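
As an illustration only (not part of the original text), the four conditions can be checked by brute force on a small finite set. This is a minimal sketch; the names `is_metric`, `abs_diff` and `squared_diff` are hypothetical helpers introduced here, and the tolerance `tol` is an assumption made to cope with floating-point values.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Brute-force check of (M1)-(M4) for d on a finite collection of points."""
    pts = list(points)
    for x, y in product(pts, repeat=2):
        if d(x, y) < 0:                          # values must lie in [0, oo)
            return False
        if x == y and d(x, y) > tol:             # (M1)
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # (M2)
            return False
        if x != y and d(x, y) <= tol:            # (M4), contrapositive form
            return False
    for x, y, z in product(pts, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:    # (M3)
            return False
    return True

def abs_diff(a, b):
    return abs(a - b)

def squared_diff(a, b):
    return (a - b) ** 2

print(is_metric([0, 1, 2, 5], abs_diff))      # True
print(is_metric([0, 1, 2], squared_diff))     # False: (M3) fails for 0, 1, 2
```

The squared difference fails only the triangle inequality, which shows that (M3) is an independent requirement.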


Definition 2. (Partial Order) Let $\preceq$ be a relation on a set $X$, i.e., $\preceq$
is a subset of $X \times X$. We use the notation $x_1 \preceq x_2$ to mean $(x_1, x_2)$
is an element of $\preceq$. We say $\preceq$ is a partial order on $X$ if the following
conditions hold:

(R1) $x \preceq x$ (reflexivity)

(R2) if $x_1 \preceq x_2$ and $x_2 \preceq x_3$ then $x_1 \preceq x_3$ (transitivity)

(R3) if $x_1 \preceq x_2$ and $x_2 \preceq x_1$ then $x_1 = x_2$ (antisymmetry)

Furthermore, we use the notations $|A|$, $\mathcal{P}(A)$, and $[A]^k$ to mean,
respectively, the number of elements in set $A$, the power set of $A$, and the set of
all subsets of $A$ containing exactly $k \ge 1$ elements.
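
The following sketch, added here for illustration, checks (R1)-(R3) by enumeration on a finite set and builds $[A]^k$ and $\mathcal{P}(A)$ explicitly. The function names are hypothetical, and divisibility is used only as a convenient example of a partial order.

```python
from itertools import combinations

def is_partial_order(elements, leq):
    """Check reflexivity, transitivity and antisymmetry on a finite set."""
    elems = list(elements)
    reflexive = all(leq(x, x) for x in elems)
    transitive = all(leq(x, z)
                     for x in elems for y in elems for z in elems
                     if leq(x, y) and leq(y, z))
    antisymmetric = all(x == y
                        for x in elems for y in elems
                        if leq(x, y) and leq(y, x))
    return reflexive and transitive and antisymmetric

def k_subsets(a, k):
    """[A]^k: all subsets of A with exactly k elements."""
    return {frozenset(c) for c in combinations(a, k)}

def power_set(a):
    """P(A): all subsets of A."""
    items = list(a)
    return {frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)}

def divides(x, y):
    return y % x == 0

def strictly_less(x, y):
    return x < y

print(is_partial_order([1, 2, 3, 6, 12], divides))   # True
print(is_partial_order([1, 2, 3], strictly_less))    # False: not reflexive
print(len(k_subsets({1, 2, 3, 4}, 2)))               # 6
```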

3 Maximum Common Subelement (MCS) Model 

In general, a natural model for the similarity of two objects is given by a number
reflecting how much the two objects overlap. The Maximum Common
Subelement (MCS) Model is a precise way of encoding this idea of similarity,
framed in a general language that can fit many different scenarios (our focus
application in the following sections is graphs). A MCS Model is composed of
three parts. The first part is a set, also called the domain of the model. The
second part, used to make the informal notion of overlap precise, is a partial
order on the domain of the model. The third and last part, used to quantify
the size of an overlap, is a size function, which assigns a size value to
each element of the domain. The following definitions fix some notation before
we formally define a MCS Model.

Figure 1: Difference between the maximum number of vertices of a common
induced subgraph and of a common subgraph of graphs $g_1$ and $g_2$ shown in (a).
No common induced subgraph of $g_1$ and $g_2$ has more than 6 vertices (b), while
there exists a common subgraph with 9 vertices (c). (These graphs represent
scientific workflow descriptions generated using [B] (2008).)

Definition 3. (Subelement, Superelement, Common Subelements) Let
$\preceq$ be a partial order on a set $X$. For $x_1, x_2 \in X$, if $x_1 \preceq x_2$, we
say that $x_1$ is a subelement of $x_2$ and that $x_2$ is a superelement of $x_1$. We
define the function common subelements, denoted by $cs$, as

$$cs(X') = \{ x \in X : x \preceq y, \ \forall y \in X' \}, \quad \text{for } X' \subseteq X.$$


Definition 4. (Size Function) Let $\preceq$ be a partial order on $X$. We say a
function $s : X \to [0, \infty)$ is a size function on $(X, \preceq)$ if, for
$x_1, x_2 \in X$, the following conditions hold:

(S1) if $x_1 \preceq x_2$ then $s(x_1) \le s(x_2)$;

(S2) if $x_1 \preceq x_2$ and $s(x_1) = s(x_2)$ then $x_1 = x_2$.

The size function conditions (S1) and (S2) formalize the idea that a subelement
must either have a smaller size (a proper subelement), or have the same
size and be the same element (a non-proper subelement). Now we are ready to
define a MCS Model.
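
To make the notions of common subelements and size function concrete, here is a small sketch on a toy domain chosen for illustration (an assumption, not an example from the paper): the subsets of {a, b, c} ordered by inclusion, with cardinality as the size. The names `subsets`, `cs` and `is_size_function` are hypothetical.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of the universe, as frozensets (the toy domain X)."""
    items = sorted(universe)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def leq(x, y):
    """Toy partial order: x is a subelement of y iff x is a subset of y."""
    return x <= y

def cs(domain, leq, group):
    """Common subelements: elements below every member of `group`."""
    return {x for x in domain if all(leq(x, y) for y in group)}

def is_size_function(domain, leq, s):
    """Check non-negativity, (S1) and (S2) by enumeration."""
    for x1 in domain:
        for x2 in domain:
            if not leq(x1, x2):
                continue
            if s(x1) > s(x2):                      # violates (S1)
                return False
            if s(x1) == s(x2) and x1 != x2:        # violates (S2)
                return False
    return all(s(x) >= 0 for x in domain)

X = subsets("abc")
print(cs(X, leq, [frozenset("ab"), frozenset("bc")]))  # the empty set and {b}
print(is_size_function(X, leq, len))                   # True
print(is_size_function(X, leq, lambda x: 1))           # False: (S2) fails
```

A constant function satisfies (S1) but not (S2), since a proper subset would have the same size as its superset.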

Definition 5. (Maximum Common Subelement Model) A Maximum Common
Subelement (MCS) Model on a set $X$ is a triple

$$(X, \preceq, s),$$

where $\preceq$ is a partial order on $X$, and $s$ is a size function on $(X, \preceq)$
such that

(A1) Given $x_1, x_2 \in X$, $cs(\{x_1, x_2\}) \neq \emptyset$ and
$\{ s(x) \mid x \in cs(\{x_1, x_2\}) \}$ has a maximum;

(A2) Given $x_1, x_2, x \in X$ and $x_1, x_2 \preceq x$, there exists
$x_{12} \in cs(\{x_1, x_2\})$ such that $s(x) \ge s(x_1) + s(x_2) - s(x_{12})$.


Condition (A1) on a MCS Model states that any two elements (not necessarily
distinct) have at least one common subelement, and, among all common
subelements, there is at least one (there could be more than one) whose size is the
largest possible. Condition (A2) is rooted in the idea that a superelement of
any two elements must somehow contain these two elements simultaneously;
in other words, it contains a kind of union of these two elements. Imagine two
finite sets $S_1$ and $S_2$: intuitively, we expect any superset $S$ of both $S_1$ and
$S_2$ to have at least as many elements as their union,
$|S| \ge |S_1 \cup S_2| = |S_1| + |S_2| - |S_1 \cap S_2|$, and never fewer.
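
The two conditions can be tested by enumeration on a finite domain. The sketch below is illustrative only: the diamond-shaped order and the two size tables `good` and `bad` are hypothetical examples, and the checker assumes that the order and the size function have already been validated separately.

```python
def satisfies_a1_a2(domain, leq, s):
    """Check (A1) and (A2) by enumeration on a finite domain."""
    for x1 in domain:
        for x2 in domain:
            common = [c for c in domain if leq(c, x1) and leq(c, x2)]
            if not common:                       # (A1) fails: no common subelement
                return False
            for x in domain:
                if leq(x1, x) and leq(x2, x):
                    if not any(s(x) >= s(x1) + s(x2) - s(c) for c in common):
                        return False             # (A2) fails
    return True

# Diamond-shaped order: bot is below a and b, which are both below top.
ORDER = {("bot", "a"), ("bot", "b"), ("bot", "top"), ("a", "top"), ("b", "top")}

def diamond_leq(x, y):
    return x == y or (x, y) in ORDER

DOMAIN = ["bot", "a", "b", "top"]
good = {"bot": 0, "a": 1, "b": 1, "top": 2}
bad = {"bot": 0, "a": 1, "b": 1, "top": 1.5}

print(satisfies_a1_a2(DOMAIN, diamond_leq, good.get))  # True
print(satisfies_a1_a2(DOMAIN, diamond_leq, bad.get))   # False
```

In the `bad` table the top element is too small to account for both `a` and `b`, whose only common subelement has size 0, so (A2) fails even though (S1) and (S2) hold.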

The MCS Model is a generalization of the model proposed by [3], referred to
here as the DR Model. The motivation to define the DR Model in their paper
was the same we had to define the MCS Model here: a template into which applied
situations can be fit, and from which metrics can be derived. A terminology
difference between the DR Model and the MCS Model is that the terms pattern,
generalization, and specialization in the former become, respectively, element,
subelement, and superelement in the latter. A more important difference is that,
in our terminology, while the DR Model requires every two elements to have at
least one common subelement and one common superelement, the MCS Model only
requires the common subelement to exist. Once the terminology between the two
models is aligned, it is straightforward to prove that the MCS Model is in fact a
generalization of the DR Model (e.g., the diamond inequality there is equivalent
to condition (A2)). Thus, all the examples given in that paper, namely weighted
sets, strings and trees (with appropriate partial order relations and size
functions), are also examples of MCS Models. [3] proved one metric function to be
valid in any DR Model. In this paper (Theorem 1) we extend this list to four
metric functions that are valid in an even more general model: the MCS Model.

When presenting examples of MCS Models in the following sections, instead
of showing that property (A1) is valid, we show that the following more
restrictive property (A1') is valid:

(A1') $0 < |cs(\{x_1, x_2\})| < \infty$.

Clearly, (A1') implies (A1): the set of common subelements of any two elements
is then nonempty and finite, so the corresponding set of sizes has a maximum.

Before going into some properties and the metrics of MCS Models we set
more terminology.

Definition 6. (Auxiliary Functions) If $(X, \preceq, s)$ is a MCS Model and $X'$
is a subset of $X$ then the maximum common subelements size function, denoted
by $s'$, is defined by

$$s'(X') = \max\{ s(x) : x \in cs(X') \},$$

and the maximum common subelements function, denoted by $mcs$, is defined by

$$mcs(X') = \{ x : s(x) = s'(X'), \ x \in cs(X') \}.$$

Note that, in general, $s'$ and $mcs$ might not be well defined (e.g., the set of
common subelements of three elements might be empty). By (A1), these functions
are well defined when $1 \le |X'| \le 2$. In the rest of the paper we use these
functions only when they are well defined.
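
A minimal sketch of $cs$, $s'$ and $mcs$ on a toy domain may help. The domain below is an assumption made for illustration and does not come from the paper: the divisors of 360 ordered by divisibility, with the number of prime factors counted with multiplicity as the size. In this toy model the maximum common subelement of two numbers is their greatest common divisor.

```python
def omega(n):
    """Number of prime factors of n counted with multiplicity (toy size)."""
    count, p = 0, 2
    while n > 1:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count

def divides(x, y):
    """Toy partial order: x is a subelement of y iff x divides y."""
    return y % x == 0

def cs(domain, leq, group):
    return {x for x in domain if all(leq(x, y) for y in group)}

def s_prime(domain, leq, s, group):
    """Largest size among the common subelements of `group`."""
    return max(s(x) for x in cs(domain, leq, group))

def mcs(domain, leq, s, group):
    """Common subelements of `group` that attain the largest size."""
    best = s_prime(domain, leq, s, group)
    return {x for x in cs(domain, leq, group) if s(x) == best}

X = [n for n in range(1, 361) if 360 % n == 0]
print(sorted(cs(X, divides, [12, 18])))      # [1, 2, 3, 6]
print(s_prime(X, divides, omega, [12, 18]))  # 2
print(mcs(X, divides, omega, [12, 18]))      # {6}
```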


3.1 Some Properties of MCS Models 

Proposition 1 (Uniqueness of minimum size element). Let $(X, \preceq, s)$ be a
MCS Model and let $x_0 \in X$ be such that $s(x_0) = \min(\{ s(x) \mid x \in X \})$.
Then, the following statements are true:

(a) if $x \in X$ is such that $s(x) = s(x_0)$, then $x = x_0$.

(b) The element $x_0$ is a global subelement, i.e., $x_0 \in cs(X)$.

Proof. (a) Let $y \in cs(\{x_0, x\})$. It exists by (A1). By (S1) and the minimality
of $s(x_0)$, we conclude that $s(y) = s(x_0) = s(x)$ and, using (S2), we conclude
that $x = y = x_0$. (b) Let $y \in X$ and, again, let $z \in cs(\{x_0, y\})$. By (S1)
and the minimality of $s(x_0)$, we conclude that $s(z) = s(x_0)$, and using (S2), we
have that $x_0 = z$, hence $x_0 \preceq y$. □

The following lemma provides a way to derive a MCS Model from a finite
metric space. The interesting relation between this metric space and its derived
MCS Model is that the metric is preserved in the structure of the MCS Model,
in the sense of Equation (1) below. Figure 2 presents a finite metric space and a
visual illustration of its derived MCS Model. We use this lemma in Section 5 to
build a relation between Graph Edit Distance and MCS Models.

Lemma 1 (Metric Space to MCS Model). Let $\Sigma$ be a finite set and let the
function $d : \Sigma \times \Sigma \to [0, \infty)$ be a metric on $\Sigma$. In this
case, there is a MCS Model

$$M_X = (X, \preceq_X, s_X)$$

where $\Sigma \subseteq X$ and, for $\sigma_1, \sigma_2 \in \Sigma$,

$$d(\sigma_1, \sigma_2) = s_X(\sigma_1) + s_X(\sigma_2) - 2 s'_X(\{\sigma_1, \sigma_2\}). \qquad (1)$$

Proof. Let $n = |\Sigma|$ and let $K_n = (\Sigma, [\Sigma]^2)$ be a complete (unlabeled
simple) graph. Assume the natural interpretation: in $K_n$, an edge
$\{\sigma_1, \sigma_2\} \in [\Sigma]^2$ has endpoints $\sigma_1, \sigma_2 \in \Sigma$.
Furthermore, let $Z$ be the set of all nonempty subsets of edges in $K_n$ that induce
a connected subgraph of $K_n$. We are now able to define the elements of our $M_X$:
the set $X$, the order relation $\preceq_X$, and the size function $s_X$. First,
the elements of $X$ are the vertices of $K_n$ plus every subset of edges in $K_n$ that
induces a connected subgraph of $K_n$:

$$X = \Sigma \cup Z.$$

For $x_1, x_2 \in X$, let $x_1 \preceq_X x_2$ if one of the following holds:

(O1) $x_1 = x_2$;

(O2) $x_1 = E \subseteq [\Sigma]^2$, $x_2 = \sigma \in \Sigma$, and vertex $\sigma$ is an
endpoint of some edge in $E$;

(O3) $x_1, x_2 \subseteq [\Sigma]^2$ and $x_1 \supseteq x_2$.

Define $R$ to be

$$R = \theta + \frac{1}{2} \sum_{\{\sigma_1, \sigma_2\} \in [\Sigma]^2} d(\sigma_1, \sigma_2),$$

for some $\theta > 0$. For $x \in X$ define

$$s_X(x) = \begin{cases} R, & \text{if } x \in \Sigma, \\ R - \frac{1}{2} \sum_{\{\sigma_1, \sigma_2\} \in x} d(\sigma_1, \sigma_2), & \text{for } x \subseteq [\Sigma]^2. \end{cases}$$

Figure 2: Metric $d$ on the set $\Sigma = \{A, B, C, D\}$ and MCS Model
$M_X = (X, \preceq_X, s_X)$ on a set $X$ where $\Sigma \subseteq X$. Each element of $X$
is represented by a circle and $A, B, C, D$ are the top four circles. Two elements
$x_1, x_2$ in $M_X$ are related by $x_1 \preceq_X x_2$ if there is an upward path from
$x_1$ to $x_2$. The size function $s_X$ grows bottom-up and its values are shown below
each corresponding element. Model $M_X$ is related to $d$ by the fact that
$d(\sigma_1, \sigma_2) = s_X(\sigma_1) + s_X(\sigma_2) - 2 s'_X(\{\sigma_1, \sigma_2\})$
for any $\sigma_1, \sigma_2$ in $\Sigma$. By Lemma 1, for any other metric (on a finite
set) there is a MCS Model satisfying the same properties as in this example.

Note that, with this definition of $s_X$, $\theta$ is the size of $[\Sigma]^2$ (which is
an element in $Z$ and is the smallest element in $M_X$). We now prove that
$M_X = (X, \preceq_X, s_X)$ is a MCS Model:

• ($\preceq_X$ is a partial order)

- (Reflexive) By (O1), $\preceq_X$ is reflexive.

- (Transitive) Assume

(H1) $x_1 \preceq_X x_2$ and (H2) $x_2 \preceq_X x_3$.

If $x_1, x_2, x_3 \in Z$, then, by (O3), $x_3 \subseteq x_2 \subseteq x_1$, therefore
$x_3 \subseteq x_1$ and, again by (O3), $x_1 \preceq_X x_3$. Note that, if
$x \in \Sigma$, then it is maximal on $\preceq_X$. Therefore, for
$1 \le i < j \le 3$, if $x_i \in \Sigma$, then $x_i = x_j$. If $x_1 \in \Sigma$
then $x_1 = x_2 = x_3$ and, by (O1), $x_1 \preceq_X x_3$. If $x_1 \in Z$ and
$x_2 \in \Sigma$, then $x_2 = x_3$ and (H1) is equivalent to $x_1 \preceq_X x_3$. If
$x_1, x_2 \in Z$ and $x_3 \in \Sigma$, then, by (H1), $x_1 \supseteq x_2$ and, by (H2),
$x_3$ is an endpoint of some edge $e$ in $x_2$. As edge $e$ is also an edge in $x_1$,
we can conclude $x_1 \preceq_X x_3$.

- (Antisymmetric) Assume

(H3) $x_1 \preceq_X x_2$ and (H4) $x_2 \preceq_X x_1$.

If $x_1 \in \Sigma$, then, by (H3) and the maximality of $x_1$, we must have
$x_1 = x_2$; the case $x_2 \in \Sigma$ is symmetric, using (H4). If
$x_1, x_2 \in Z$, then (H3) and (H4) mean $x_1 \supseteq x_2$ and
$x_2 \supseteq x_1$, therefore $x_1 = x_2$.

• ($s_X$ is a size function) First, by the definitions of $R$ and $s_X$ it is easy
to check that $s_X(x) > 0$ for all $x \in X$. If $x_1 \preceq_X x_2$ then three cases
can occur:

(C1) $x_1, x_2 \in Z$, (C2) $x_1 \in Z$, $x_2 \in \Sigma$, (C3) $x_1, x_2 \in \Sigma$.

- (S1) We need to show that:

if $x_1 \preceq_X x_2$ then $s_X(x_1) \le s_X(x_2)$.

If (C1) occurs then $x_1 \supseteq x_2$ and, by definition, the expression for
$s_X(x_1)$ subtracts from $R$ at least the same edge terms $d(\sigma_1, \sigma_2)/2$
as the expression for $s_X(x_2)$, therefore $s_X(x_1) \le s_X(x_2)$. If case
(C2) occurs, $x_1$ has at least one edge $e$ and the expression for $s_X(x_1)$
subtracts at least one positive value from $R$ (e.g., the value relative
to $e$). Since $s_X(x_2) = R$, then $s_X(x_1) < s_X(x_2)$. If case (C3) occurs, then
$x_1 = x_2$, therefore $s_X(x_1) \le s_X(x_2)$.

- (S2) We need to show that:

if $x_1 \preceq_X x_2$ and $s_X(x_1) = s_X(x_2)$ then $x_1 = x_2$.

Assume $x_1 \preceq_X x_2$ and $s_X(x_1) = s_X(x_2)$. If (C1) occurs, then
$x_1 \supseteq x_2$ and, by definition, the expression for $s_X(x_1)$ subtracts from
$R$ at least the same edge terms as in $s_X(x_2)$. If we assume $x_1 \neq x_2$ then
$s_X(x_1)$ would subtract at least one more positive term from $R$ and therefore
$s_X(x_1) < s_X(x_2)$, which contradicts the assumption $s_X(x_1) = s_X(x_2)$.
Case (C2) cannot occur, because $x_1 \preceq_X x_2$, $x_1 \in Z$, and
$x_2 \in \Sigma$ would imply $s_X(x_1) < s_X(x_2)$, since $s_X(x_2) = R$ and the
definition of $s_X(x_1)$ subtracts at least one positive term from $R$. If (C3)
occurs, then necessarily $x_1 = x_2$.

• (A1) We show that (A1) holds by showing that (A1') holds. In
order to do so, let $x_1, x_2 \in X$. By definition, the element $x = [\Sigma]^2 \in X$
is a subelement of all other objects in $X$, therefore $|cs(\{x_1, x_2\})| \ge 1$ for
all $x_1, x_2 \in X$. As $X$ is finite, $|cs(\{x_1, x_2\})|$ must be finite.

• (A2) We have to show that, given $x_1, x_2, x \in X$ with $x_1, x_2 \preceq_X x$,
there exists $x_{12} \in cs(\{x_1, x_2\})$ such that
$s_X(x) \ge s_X(x_1) + s_X(x_2) - s_X(x_{12})$.

Before going into this axiom, we first note that if $x_1, x_2 \in Z$, $x \in X$ and
$x_1, x_2 \preceq_X x$, then $x_1 \cup x_2$ is also an element of $Z$. To see this fact,
let $\sigma \in \Sigma$ be equal to $x$, if $x \in \Sigma$, or an endpoint of one edge in
$x$, if $x \in Z$. Suppose $\sigma_a$ is a vertex in $x_1$ and $\sigma_b$ is a vertex in
$x_2$ (both are in $x_1 \cup x_2$). There must be a path from $\sigma_a$ to $\sigma$ in
$x_1$ (which is contained in $x_1 \cup x_2$) and there must be a path from $\sigma_b$
to $\sigma$ in $x_2$ (which is contained in $x_1 \cup x_2$). By joining these paths we
have a path from $\sigma_a$ to $\sigma_b$ in $x_1 \cup x_2$. Therefore,
$x_1 \cup x_2 \in Z$.

We split the proof into three cases: (1) $x_1 \in \Sigma$; (2) $x \in Z$; (3)
$x \in \Sigma$ and $x_1, x_2 \in Z$. By the symmetric roles that $x_1$ and $x_2$ take in
this axiom, these cases are enough to cover all possibilities.

- Case (1): As $x_1 \in \Sigma$, we must have $x = x_1$ and, making $x_{12} = x_2$,
we have the axiom, since

$$s_X(x) \ge s_X(x_1) = s_X(x_1) + s_X(x_2) - s_X(x_2) = s_X(x_1) + s_X(x_2) - s_X(x_{12}).$$

- Case (2): As $x \in Z$, then $x_1 \supseteq x$ and $x_2 \supseteq x$. Note that
$x_1 \cap x_2 \supseteq x$ and $x$ has at least one edge since it is an element in $Z$.
Making $x_{12} = x_1 \cup x_2$ we have the axiom, since:

$$s_X(x) \ge s_X(x_1 \cap x_2) = s_X(x_1) + s_X(x_2) - s_X(x_{12}).$$

Note that $x_1 \cap x_2$ might not be a member of $Z$, but, since the formula
for $s_X$ is well defined for any subset of $[\Sigma]^2$, we used it.

- Case (3): Note that the inequality

$$s_X(x) \ge s_X(x_1 \cap x_2)$$

is also true in this case. If $x_1 \cap x_2$ is the empty set we have that the
only element in common between the graphs induced by $x_1$ and $x_2$
is the single vertex $x$. We can use the same development as in the
previous case to establish the axiom in this case.

It now remains to show that Equation (1) is valid. The case where
$\sigma_1 = \sigma_2$ is trivially true. Suppose $\sigma_1 \neq \sigma_2$. By the
definition of $\preceq_X$ we know that $\{\{\sigma_1, \sigma_2\}\} \preceq_X \sigma_1$
and $\{\{\sigma_1, \sigma_2\}\} \preceq_X \sigma_2$. Furthermore, by the definition of
$s_X$,

$$s_X(\{\{\sigma_1, \sigma_2\}\}) = R - \tfrac{1}{2} d(\sigma_1, \sigma_2)$$
$$\implies 2 s_X(\{\{\sigma_1, \sigma_2\}\}) = 2R - d(\sigma_1, \sigma_2)$$
$$\implies d(\sigma_1, \sigma_2) = 2R - 2 s_X(\{\{\sigma_1, \sigma_2\}\})$$
$$\implies d(\sigma_1, \sigma_2) = s_X(\sigma_1) + s_X(\sigma_2) - 2 s_X(\{\{\sigma_1, \sigma_2\}\}),$$

which is in the form of Equation (1). If
$\{\{\sigma_1, \sigma_2\}\} \in mcs(\{\sigma_1, \sigma_2\})$ then we have the result.
Let us show that this is indeed true. Let $x$ be a member of
$mcs(\{\sigma_1, \sigma_2\})$. It then satisfies $x \preceq_X \sigma_1, \sigma_2$. It also
must be in $Z$ since $\sigma_1 \neq \sigma_2$. There must be a path
$\sigma_1, \beta_1, \beta_2, \ldots, \beta_k, \sigma_2$ in $x$, otherwise
$x \preceq_X \sigma_1$ and $x \preceq_X \sigma_2$ would not both be true. Actually $x$
must induce a path from $\sigma_1$ to $\sigma_2$, otherwise we could remove the extra
(non-path) edges and still get a set of edges inducing a connected subgraph and with
a larger $s_X$. Furthermore, for paths of the form
$\sigma_1, \beta_1, \beta_2, \ldots, \beta_k, \sigma_2$, replacing edges
$\{\sigma_1, \beta_1\}$ and $\{\beta_1, \beta_2\}$ by $\{\sigma_1, \beta_2\}$ we still
have a path, and, by the fact that $d$ is a metric (triangle inequality), we have not
decreased the size $s_X$ of our $x$. This way we can erase all intermediate vertices
and get that the edge set that induces the path $\sigma_1, \sigma_2$ must be a member
of $mcs(\{\sigma_1, \sigma_2\})$. With this, the result is established. □
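
The construction in the proof can be replayed numerically on a very small metric space. The sketch below is illustrative only and makes several assumptions that are not in the paper: four hypothetical points on a line with the absolute difference as metric, $\theta = 1$, and points of $\Sigma$ represented as strings so that they can be told apart from edge sets (frozensets).

```python
from itertools import combinations

def is_connected(edge_set):
    """True if the edges form a connected graph on the vertices they touch."""
    edges = list(edge_set)
    reached = set(edges[0])
    grew = True
    while grew:
        grew = False
        for e in edges:
            if e & reached and not e <= reached:
                reached |= e
                grew = True
    return all(e <= reached for e in edges)

def build_model(sigma, d, theta=1.0):
    """Return (X, leq, size) following the construction in the proof of Lemma 1."""
    points = list(sigma)
    all_edges = [frozenset(p) for p in combinations(points, 2)]
    weight = {e: d(*sorted(e)) for e in all_edges}
    Z = [frozenset(c) for k in range(1, len(all_edges) + 1)
         for c in combinations(all_edges, k) if is_connected(c)]
    R = theta + 0.5 * sum(weight.values())

    def size(x):
        if isinstance(x, frozenset):             # x is an edge set in Z
            return R - 0.5 * sum(weight[e] for e in x)
        return R                                 # x is a point of Sigma

    def leq(x1, x2):
        if x1 == x2:                             # (O1)
            return True
        if isinstance(x1, frozenset) and not isinstance(x2, frozenset):
            return any(x2 in e for e in x1)      # (O2)
        if isinstance(x1, frozenset) and isinstance(x2, frozenset):
            return x1 >= x2                      # (O3)
        return False

    return points + Z, leq, size

def s_prime(X, leq, size, a, b):
    """Largest size among the common subelements of a and b."""
    return max(size(x) for x in X if leq(x, a) and leq(x, b))

POS = {"A": 0, "B": 1, "C": 3, "D": 7}          # hypothetical points on a line

def d(a, b):
    return abs(POS[a] - POS[b])

X, leq, size = build_model(POS, d)
for a, b in combinations(POS, 2):
    # Both columns should agree, as stated by Equation (1).
    print(a, b, d(a, b), size(a) + size(b) - 2 * s_prime(X, leq, size, a, b))
```

Enumerating all connected edge subsets is exponential in the number of points, so this check is only practical for very small spaces.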

Lemma 2 is a technical property used in Section 3.2 in the proof of Theorem 1.

Lemma 2. Let $(X, \preceq, s)$ be a MCS Model. The inequality

$$s'(\{x_1, x_2\}) + s'(\{x_2, x_3\}) \le s(x_2) + s'(\{x_1, x_3\}) \qquad (2)$$

holds for all $x_1, x_2, x_3 \in X$.

Proof. Let $x_{12} \in mcs(\{x_1, x_2\})$ and $x_{23} \in mcs(\{x_2, x_3\})$. As
$x_{12}, x_{23} \preceq x_2$, we can use axiom (A2) to conclude that there exists
$x_{123} \in cs(\{x_{12}, x_{23}\})$ such that
$s(x_2) \ge s(x_{12}) + s(x_{23}) - s(x_{123})$. Since $x_{123} \preceq x_{12} \preceq x_1$
and $x_{123} \preceq x_{23} \preceq x_3$, we also have $x_{123} \in cs(\{x_1, x_3\})$, so
$s(x_{123}) \le s'(\{x_1, x_3\})$. We then can write

$$s'(\{x_1, x_2\}) + s'(\{x_2, x_3\}) = s(x_{12}) + s(x_{23}) \le s(x_2) + s(x_{123}) \le s(x_2) + s'(\{x_1, x_3\}).$$

□
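
As a sanity check, inequality (2) can be tested by enumeration in a toy model. The model used here is an assumption made for illustration and is not taken from the paper: positive integers ordered by divisibility, with the number of prime factors (counted with multiplicity) as size, so that $s'(\{x, y\})$ is the size of $\gcd(x, y)$.

```python
from math import gcd
from itertools import product

def omega(n):
    """Number of prime factors of n counted with multiplicity (toy size)."""
    count, p = 0, 2
    while n > 1:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count

def s_prime(x, y):
    # In the divisibility toy model the common subelements of x and y are
    # their common divisors, and the one of largest size is gcd(x, y).
    return omega(gcd(x, y))

def lemma2_holds(x1, x2, x3):
    """Inequality (2) for one triple of elements."""
    return s_prime(x1, x2) + s_prime(x2, x3) <= omega(x2) + s_prime(x1, x3)

violations = [t for t in product(range(1, 41), repeat=3) if not lemma2_holds(*t)]
print(violations)   # []
```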
3.2 Metrics on MCS Models


The main result in this section is Theorem 1, which states that four different
functions are indeed metrics on MCS Models.

Theorem 1 (Metrics on MCS Models). Let $M = (X, \preceq, s)$ be a MCS
Model on $X$ and let $d_a, d_b, d_c, d_d$ be

$$d_a(x_1, x_2) = s(x_1) + s(x_2) - 2 s'(\{x_1, x_2\}), \qquad (3)$$

$$d_b(x_1, x_2) = \max\{s(x_1), s(x_2)\} - s'(\{x_1, x_2\}), \qquad (4)$$

$$d_c(x_1, x_2) = \begin{cases} 0, & \text{if } s(x_1) = s(x_2) = 0, \\ 1 - \dfrac{s'(\{x_1, x_2\})}{\max\{s(x_1), s(x_2)\}}, & \text{otherwise,} \end{cases} \qquad (5)$$

$$d_d(x_1, x_2) = \begin{cases} 0, & \text{if } s(x_1) = s(x_2) = 0, \\ 1 - \dfrac{s'(\{x_1, x_2\})}{s(x_1) + s(x_2) - s'(\{x_1, x_2\})}, & \text{otherwise.} \end{cases} \qquad (6)$$

Then, all of them are metrics on $X$.
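
The four formulas can be evaluated directly once $s$ and $s'$ are available. The sketch below is illustrative and assumes a toy model that is not spelled out in the paper: finite sets ordered by inclusion with cardinality as size, where $s'(\{x, y\})$ is the size of the intersection. Under that assumption $d_a$ is the size of the symmetric difference and $d_d$ is the Jaccard distance. The helper `triangle_ok` is a hypothetical brute-force check of (M3).

```python
from itertools import combinations, product

def size(x):
    return len(x)

def s_prime(x, y):
    return len(x & y)       # the largest common subset is the intersection

def d_a(x, y):
    return size(x) + size(y) - 2 * s_prime(x, y)

def d_b(x, y):
    return max(size(x), size(y)) - s_prime(x, y)

def d_c(x, y):
    if size(x) == 0 and size(y) == 0:
        return 0.0
    return 1 - s_prime(x, y) / max(size(x), size(y))

def d_d(x, y):
    if size(x) == 0 and size(y) == 0:
        return 0.0
    return 1 - s_prime(x, y) / (size(x) + size(y) - s_prime(x, y))

def all_subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def triangle_ok(d, domain, tol=1e-12):
    """Brute-force check of (M3) for d on a finite domain."""
    return all(d(x, z) <= d(x, y) + d(y, z) + tol
               for x, y, z in product(domain, repeat=3))

x, y = frozenset("abc"), frozenset("bcd")
print(d_a(x, y), d_b(x, y), d_c(x, y), d_d(x, y))
print(all(triangle_ok(d, all_subsets("abcd")) for d in (d_a, d_b, d_c, d_d)))
```

For the two sets above, the exact values are $d_a = 2$, $d_b = 1$, $d_c = 1/3$ and $d_d = 1/2$.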

Proof. Since $s'(\{x_1, x_1\}) = s(x_1)$, it is easy to check that (M1) is true for all
formulas. Furthermore, as $s'(\{x_1, x_2\}) = s'(\{x_2, x_1\})$, it is also easy to see
that (M2) is true for all formulas. Note that for any Maximum Common Subelement
Model we have

$$x_1 \neq x_2 \implies s'(\{x_1, x_2\}) < \max(\{s(x_1), s(x_2)\}) \le s(x_1) + s(x_2) - s'(\{x_1, x_2\}). \qquad (7)$$

It is now easy to see that (M4) is true in all formulas (except for the case
$s(x_1) = s(x_2) = 0$ on $d_c$ and $d_d$) by using its contrapositive form

if $x_1 \neq x_2$ then $d(x_1, x_2) \neq 0$.

Assume $x_1 \neq x_2$ and check that, using Equation (7) above, each of the four
distance formulas results in a positive number. The case when
$s(x_1) = s(x_2) = 0$ on the formulas $d_c$ and $d_d$ is also true because, by
Proposition 1, there can be only one element with size zero in a MCS Model. The
proof of (M3) will be given separately for each formula. For all the following
proofs let $x_1, x_2, x_3 \in X$.

• (M3) is valid for $d_a$: Using Lemma 2 we can write

$$0 \le s(x_2) + s'(\{x_1, x_3\}) - s'(\{x_1, x_2\}) - s'(\{x_2, x_3\})$$
$$\implies 0 \le 2 s(x_2) + 2 s'(\{x_1, x_3\}) - 2 s'(\{x_1, x_2\}) - 2 s'(\{x_2, x_3\})$$
$$\implies s(x_1) + s(x_3) - 2 s'(\{x_1, x_3\}) \le s(x_1) + s(x_2) - 2 s'(\{x_1, x_2\}) + s(x_2) + s(x_3) - 2 s'(\{x_2, x_3\})$$
$$\implies d_a(x_1, x_3) \le d_a(x_1, x_2) + d_a(x_2, x_3),$$

which proves (M3) for $d_a$.

• (M3) is valid for $d_b$: We split this proof into three cases. These are the
only cases that need to be considered, since the roles played by $x_1$ and $x_3$ in
(M3) are symmetric.

- (Case 1) If $s(x_2) \le s(x_1) \le s(x_3)$, then, since $s(x_2) \le s(x_1)$,
Lemma 2 gives

$$0 \le s(x_1) + s'(\{x_1, x_3\}) - s'(\{x_1, x_2\}) - s'(\{x_2, x_3\})$$
$$\implies s(x_3) - s'(\{x_1, x_3\}) \le s(x_1) - s'(\{x_1, x_2\}) + s(x_3) - s'(\{x_2, x_3\})$$
$$\iff \max(\{s(x_1), s(x_3)\}) - s'(\{x_1, x_3\}) \le \max(\{s(x_1), s(x_2)\}) - s'(\{x_1, x_2\}) + \max(\{s(x_2), s(x_3)\}) - s'(\{x_2, x_3\})$$
$$\implies d_b(x_1, x_3) \le d_b(x_1, x_2) + d_b(x_2, x_3).$$

- (Case 2) If $s(x_1) \le s(x_2) \le s(x_3)$, adding $s(x_3)$ to both sides of (2)
and rearranging, we have

$$s(x_3) - s'(\{x_1, x_3\}) \le s(x_2) - s'(\{x_1, x_2\}) + s(x_3) - s'(\{x_2, x_3\})$$
$$\iff \max(\{s(x_1), s(x_3)\}) - s'(\{x_1, x_3\}) \le \max(\{s(x_1), s(x_2)\}) - s'(\{x_1, x_2\}) + \max(\{s(x_2), s(x_3)\}) - s'(\{x_2, x_3\})$$
$$\implies d_b(x_1, x_3) \le d_b(x_1, x_2) + d_b(x_2, x_3).$$

- (Case 3) If $s(x_1) \le s(x_3) \le s(x_2)$, adding $s(x_3)$ to the left hand side
and $s(x_2)$ to the right hand side of (2) and rearranging, we have

$$s(x_3) - s'(\{x_1, x_3\}) \le s(x_2) - s'(\{x_1, x_2\}) + s(x_2) - s'(\{x_2, x_3\})$$
$$\iff \max(\{s(x_1), s(x_3)\}) - s'(\{x_1, x_3\}) \le \max(\{s(x_1), s(x_2)\}) - s'(\{x_1, x_2\}) + \max(\{s(x_2), s(x_3)\}) - s'(\{x_2, x_3\})$$
$$\implies d_b(x_1, x_3) \le d_b(x_1, x_2) + d_b(x_2, x_3).$$
• (M3) is valid for $d_c$: The triangle inequality (M3) requires the expression
$d_c(x_1, x_2) + d_c(x_2, x_3) - d_c(x_1, x_3)$ to be greater than or equal to zero. We
split this into three cases:

- (Case 1) If $s(x_1) = s(x_2) = 0$, then, by Proposition 1, $x_1 = x_2$ and the
triangle inequality becomes $d_c(x_1, x_1) + d_c(x_1, x_3) - d_c(x_1, x_3) \ge 0$
which, by definition of $d_c$, can be reduced to $d_c(x_1, x_3) \ge d_c(x_1, x_3)$,
which is obviously true. An analogous argument can be made to
show that (M3) is valid in the cases where $s(x_1) = s(x_3) = 0$ and
$s(x_2) = s(x_3) = 0$.

If (Case 1) does not occur, then at least two elements in $\{x_1, x_2, x_3\}$ have
size greater than zero and the definition of $d_c$ we need to use is the
bottom one in Equation (5). In this case, the expression for (M3) becomes
(8) $\ge 0$, where (8) is

$$1 + \frac{s'(\{x_1, x_3\})}{M_{13}} - \frac{s'(\{x_1, x_2\})}{M_{12}} - \frac{s'(\{x_2, x_3\})}{M_{23}} \qquad (8)$$

and $M_{ij}$ is a short name for $\max\{s(x_i), s(x_j)\}$. The remaining cases that
are sufficient to prove that (M3) is valid for $d_c$ (the case where $s(x_3)$ is the
largest size is symmetric to Case 3) are:

- (Case 2) If not (Case 1) and $s(x_2) \ge s(x_1), s(x_3)$, then
$M_{12} = M_{23} = s(x_2)$ and $M_{13} \le s(x_2)$, so

$$s(x_2) \times (8) = s(x_2) \left( 1 + \frac{s'(\{x_1, x_3\})}{M_{13}} - \frac{s'(\{x_1, x_2\})}{M_{12}} - \frac{s'(\{x_2, x_3\})}{M_{23}} \right)$$
$$\ge s(x_2) + s'(\{x_1, x_3\}) - s'(\{x_1, x_2\}) - s'(\{x_2, x_3\})$$
$$\ge 0, \text{ by Lemma 2.}$$

Since $s(x_2) > 0$, this implies that (8) $\ge 0$.

- (Case 3) Similarly, if not (Case 1) and $s(x_1) \ge s(x_2), s(x_3)$, then
$M_{12} = M_{13} = s(x_1)$ and

$$s(x_1) \times (8) = s(x_1) \left( 1 + \frac{s'(\{x_1, x_3\})}{M_{13}} - \frac{s'(\{x_1, x_2\})}{M_{12}} - \frac{s'(\{x_2, x_3\})}{M_{23}} \right)$$
$$= s(x_1) \left( 1 - \frac{s'(\{x_2, x_3\})}{M_{23}} \right) + s'(\{x_1, x_3\}) - s'(\{x_1, x_2\})$$
$$\ge s(x_2) \left( 1 - \frac{s'(\{x_2, x_3\})}{M_{23}} \right) + s'(\{x_1, x_3\}) - s'(\{x_1, x_2\})$$
$$= s(x_2) - \frac{s(x_2) \, s'(\{x_2, x_3\})}{M_{23}} + s'(\{x_1, x_3\}) - s'(\{x_1, x_2\})$$
$$\ge s(x_2) - s'(\{x_2, x_3\}) + s'(\{x_1, x_3\}) - s'(\{x_1, x_2\})$$
$$\ge 0, \text{ by Lemma 2.}$$

Since $s(x_1) > 0$, this implies that (8) $\ge 0$.

• (M3) is valid for $d_d$: This proof follows the same lines as the one
given for graphs in [7]. The triangle inequality (M3) requires the expression
$d_d(x_1, x_2) + d_d(x_2, x_3) - d_d(x_1, x_3)$ to be greater than or equal to zero. We
first handle a degenerate case:

- (Case 1) If $s(x_1) = s(x_2) = 0$, then, by Proposition 1, $x_1 = x_2$ and the
triangle inequality becomes $d_d(x_1, x_1) + d_d(x_1, x_3) - d_d(x_1, x_3) \ge 0$,
which can be reduced to $d_d(x_1, x_3) \ge d_d(x_1, x_3)$, which is obviously
true. An analogous argument can be made to show that (M3) is
valid in the cases where $s(x_1) = s(x_3) = 0$ and $s(x_2) = s(x_3) = 0$.

If (Case 1) does not occur, then at least two elements in $\{x_1, x_2, x_3\}$ have
size greater than zero and the definition of $d_d$ we need to use is the
bottom one in Equation (6). In this case, the expression for (M3) becomes
(9) $\ge 0$, where (9) is

$$1 + \frac{s'(\{x_1, x_3\})}{U_{13}} - \frac{s'(\{x_1, x_2\})}{U_{12}} - \frac{s'(\{x_2, x_3\})}{U_{23}} \qquad (9)$$

and $U_{ij} = s(x_i) + s(x_j) - s'(\{x_i, x_j\})$. Let $x_{ij} \in mcs(\{x_i, x_j\})$. By
(A2) of a Maximum Common Subelement Model there exists
$x_{123} \preceq x_{12}, x_{23}$ such that $s(x_{123}) \le s(x_{13}) = s'(\{x_1, x_3\})$
and $s(x_{123}) \ge s(x_{12}) + s(x_{23}) - s(x_2)$.

In this way, and because $t \mapsto t / (c - t)$ is increasing in $t$, we can write

$$(9) = 1 + \frac{s(x_{13})}{U_{13}} - \frac{s(x_{12})}{U_{12}} - \frac{s(x_{23})}{U_{23}} \ge 1 + \frac{s(x_{123})}{s(x_1) + s(x_3) - s(x_{123})} - \frac{s(x_{12})}{U_{12}} - \frac{s(x_{23})}{U_{23}}. \qquad (10)$$

Let the non-negative numbers $a_1, a_2, a_3, a_{12}, a_{23}, a_{123}$ be defined by
$s(x_{123}) = a_{123}$; $s(x_{12}) = a_{12} + a_{123}$; $s(x_{23}) = a_{23} + a_{123}$;
$s(x_1) = a_1 + a_{12} + a_{123}$; $s(x_2) = a_2 + a_{12} + a_{23} + a_{123}$;
$s(x_3) = a_3 + a_{23} + a_{123}$. And let
$T = a_1 + a_2 + a_3 + a_{12} + a_{23} + a_{123}$. We now can write

$$(10) = 1 + \frac{a_{123}}{T - a_2} - \frac{a_{12} + a_{123}}{T - a_3} - \frac{a_{23} + a_{123}}{T - a_1}. \qquad (11)$$

To show that (11) $\ge 0$ it is sufficient to show that (11) times a positive
number is greater than or equal to zero. Let $(T - a_1)(T - a_2)(T - a_3)$ be
this positive number; it is positive since (Case 1) is false. Then

$$(11) \times (T - a_1)(T - a_2)(T - a_3)$$
$$= (T - a_1)(T - a_2)(T - a_3) + a_{123} (T - a_1)(T - a_3) - (a_{12} + a_{123})(T - a_1)(T - a_2) - (a_{23} + a_{123})(T - a_2)(T - a_3)$$
$$= (a_1 + a_2)(T - a_1)(T - a_2) + a_{23} (a_3 - a_1)(T - a_2) + a_{123} (a_2 - a_1)(T - a_3)$$
$$= a_2 (T - a_1)(T - a_2) + a_3 a_{23} (T - a_2) + a_2 a_{123} (T - a_3) + a_1 \left[ (a_3 + a_{12})(T - a_2) + a_2 (a_1 + a_3 + a_{12} + a_{23}) + a_3 a_{123} \right]$$
$$\ge 0,$$

where the second equality uses $T - a_3 - a_{12} - a_{123} = a_1 + a_2 + a_{23}$, and the
last expression is a sum of products of non-negative numbers.

The proof of Theorem 1 is complete. □


4 Graph MCS Models 

In this section we present three examples of MCS Models on graphs and use
Theorem 1 to derive different metrics on graphs for each of these examples. In
particular, we are able to reproduce and generalize previous metrics on graphs
based on subgraphs and induced subgraphs, and obtain new metrics on graphs
based on an extended subgraph notion.

4.1 Graphs Terminology 

Here is a series of graph related definitions we use in the rest of the paper. We 
chose undirected simple graphs as our default case, but the results we present 
in the following sections also work for directed graphs. 

Definition 7. (Graph) A graph is a 4-tuple $g = (V, E, \ell_V, \ell_E)$ where

• $V$ is a finite set of vertices;

• $E \subseteq [V]^2$ is the set of edges;

• $\ell_V : V \to \mathcal{L}_V$ is a function that assigns labels to vertices;

• $\ell_E : E \to \mathcal{L}_E$ is a function that assigns labels to edges.

If $V = \emptyset$ then $g$ is called the empty graph.

Definition 8. (Subgraph) A graph $g' = (V', E', \ell'_V, \ell'_E)$ is said to be a
subgraph of $g = (V, E, \ell_V, \ell_E)$ if $V' \subseteq V$,
$E' \subseteq E \cap [V']^2$, $\ell'_V(v) = \ell_V(v)$ for $v \in V'$, and
$\ell'_E(e) = \ell_E(e)$ for $e \in E'$.

Definition 9. (Induced Subgraph) A graph $g' = (V', E', \ell'_V, \ell'_E)$ is said
to be an induced subgraph of $g = (V, E, \ell_V, \ell_E)$ if $g'$ is a subgraph of $g$
and $E' = E \cap [V']^2$.
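
Definitions 8 and 9 translate almost literally into code. The sketch below is illustrative: it assumes a graph is stored as a tuple `(V, E, lv, le)` with edges as two-element frozensets and labels as dictionaries, and the helper `make_graph` with its default labels is hypothetical.

```python
from itertools import combinations

def pairs(vertices):
    """[V]^2: all 2-element subsets of the vertex set."""
    return {frozenset(p) for p in combinations(vertices, 2)}

def is_subgraph(gp, g):
    """Definition 8: gp = (V', E', lv', le') is a subgraph of g."""
    Vp, Ep, lvp, lep = gp
    V, E, lv, le = g
    return (Vp <= V
            and Ep <= (E & pairs(Vp))
            and all(lvp[v] == lv[v] for v in Vp)
            and all(lep[e] == le[e] for e in Ep))

def is_induced_subgraph(gp, g):
    """Definition 9: a subgraph that keeps every edge of g among its vertices."""
    Vp, Ep, _, _ = gp
    _, E, _, _ = g
    return is_subgraph(gp, g) and Ep == (E & pairs(Vp))

def make_graph(vertices, edges, vlabel="C", elabel="-"):
    """Build a uniformly labeled graph (labels are placeholders)."""
    V = set(vertices)
    E = {frozenset(e) for e in edges}
    return V, E, {v: vlabel for v in V}, {e: elabel for e in E}

triangle = make_graph([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
path = make_graph([1, 2, 3], [(1, 2), (2, 3)])
edge = make_graph([1, 2], [(1, 2)])

print(is_subgraph(path, triangle))          # True
print(is_induced_subgraph(path, triangle))  # False: edge {1, 3} is missing
print(is_induced_subgraph(edge, triangle))  # True
```

The path on three vertices is a subgraph of the triangle but not an induced one, which is the distinction that Figure 1 illustrates on larger graphs.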
Definition 10. (Isomorphism) Let $g_1 = (V_1, E_1, \ell_{V_1}, \ell_{E_1})$ and
$g_2 = (V_2, E_2, \ell_{V_2}, \ell_{E_2})$ be graphs. A bijection $\phi : V_1 \to V_2$ is an
isomorphism between $g_1$ and $g_2$ if the following three conditions hold:
(1) $E_2 = \{ \{\phi(u), \phi(v)\} : \{u, v\} \in E_1 \}$;
(2) $\ell_{V_1}(v) = \ell_{V_2}(\phi(v))$, for $v \in V_1$;
(3) $\ell_{E_1}(\{u, v\}) = \ell_{E_2}(\{\phi(u), \phi(v)\})$, for $\{u, v\} \in E_1$.
If there exists an isomorphism between two graphs we say they are isomorphic.

Remark 1. We use the notation $\phi(e)$, where $e = \{u, v\} \in E_1$ and
$u, v \in V_1$, to mean the edge $\{\phi(u), \phi(v)\} \in E_2$.

Definition 11. (Subgraph Isomorphic) A graph $g$ is subgraph isomorphic
to a graph $g'$, denoted by $g' \subseteq g$, if there exists a subgraph of $g$ that is
isomorphic to $g'$.


Definition 12. (Induced Subgraph Isomorphic) A graph g is induced subgraph
isomorphic to a graph g', denoted by g' g, if there exists an induced
subgraph of g that is isomorphic to g'.


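Definition 13. (Graph n-Completion) Let $g = (V, E, \ell_V, \ell_E)$ be a graph with vertex labels in $\Sigma_V$ and edge labels in $\Sigma_E$. For $n \geq |V|$, a special vertex label $\epsilon_V$, and a special edge label $\epsilon_E$, we define the graph n-completion of $g$ as

$$K_n^{\epsilon_V, \epsilon_E}(g) = (V', E', \ell'_V, \ell'_E)$$

where

• $V' = V \cup \{u_1, \ldots, u_{n-|V|}\}$;

• $E' = [V']^2$;

• $\ell'_V(v) = \ell_V(v)$ if $v \in V$, and $\ell'_V(v) = \epsilon_V$ if $v \in V' \setminus V$;

• $\ell'_E(e) = \ell_E(e)$ if $e \in E$, and $\ell'_E(e) = \epsilon_E$ if $e \in E' \setminus E$.

When $\epsilon_V$ and $\epsilon_E$ are clear in the context, we denote the graph n-completion of $g$ as $K_n(g)$.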
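The n-completion can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: graphs are pairs of label dictionaries, the fresh vertices are named `("u", i)` (assumed not to clash with existing vertex names), and the strings `"eps_V"` and `"eps_E"` stand in for the special labels.

```python
from itertools import combinations

def n_completion(vertex_labels, edge_labels, n, eps_v="eps_V", eps_e="eps_E"):
    """Return the label dictionaries (l'_V, l'_E) of the n-completion of g."""
    if n < len(vertex_labels):
        raise ValueError("n must be at least |V|")
    vl = dict(vertex_labels)
    # Add n - |V| fresh vertices u_1, ..., u_{n-|V|}, all labeled eps_v.
    for i in range(1, n - len(vertex_labels) + 1):
        vl[("u", i)] = eps_v
    el = {frozenset(e): lab for e, lab in edge_labels.items()}
    # E' = [V']^2: every pair that is not yet an edge gets the label eps_e.
    for u, v in combinations(list(vl), 2):
        el.setdefault(frozenset((u, v)), eps_e)
    return vl, el

vl, el = n_completion({"a": "A", "b": "B"}, {("a", "b"): "x"}, 3)
print(len(vl), len(el))                 # 3 3
print(el[frozenset(("a", ("u", 1)))])   # eps_E
```

Original vertices and edges keep their labels; only the padding receives the special labels.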

4.2 Subgraph MCS Model 


The first example of MCS Model on graphs is based on the subgraph isomorphic relation $\sqsubseteq$ (Definition 11).
Definition 14. (S-MCS Model) A subgraph MCS Model or S-MCS Model is a triple

$$(G, \sqsubseteq, s_{GVE\alpha}),$$

where

• $G$ is the set of graphs (Definition 7) with vertex labels in $\Sigma_V$ and edge labels in $\Sigma_E$. Furthermore, we consider two graphs $g_1, g_2 \in G$ that are isomorphic to be the same graph: $g_1 = g_2$.

• $\sqsubseteq$ is the subgraph isomorphic relation on $G$ (Definition 11);

• $s_{GVE\alpha} : G \to [0, +\infty)$ is a function based on a label weighting function $\alpha : (\Sigma_V \cup \Sigma_E) \to (0, +\infty)$ and, for $g = (V, E, \ell_V, \ell_E)$, is defined by

$$s_{GVE\alpha}(g) = \begin{cases} 0, & \text{if } V = \emptyset; \\ \sum_{v \in V} \alpha(\ell_V(v)) + \sum_{e \in E} \alpha(\ell_E(e)), & \text{otherwise.} \end{cases}$$
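The size function of Definition 14 is a weighted count of vertices and edges. A minimal sketch follows, assuming a graph is given as two label dictionaries and that `alpha` is any strictly positive function on labels (the names are illustrative, not from the paper).

```python
def size_gve(vertex_labels, edge_labels, alpha):
    """s_GVEalpha(g): sum of alpha over vertex and edge labels, 0 if V is empty."""
    if not vertex_labels:
        return 0
    return (sum(alpha(lab) for lab in vertex_labels.values())
            + sum(alpha(lab) for lab in edge_labels.values()))

vl = {"a": "A", "b": "B", "c": "C"}
el = {frozenset(("a", "b")): "x", frozenset(("b", "c")): "x",
      frozenset(("a", "c")): "x"}

unit = lambda label: 1                               # constant weighting
weights = {"A": 2, "B": 1, "C": 1, "x": 0.5}.get     # domain-specific weighting
print(size_gve(vl, el, unit))     # 6, i.e. |V| + |E|
print(size_gve(vl, el, weights))  # 5.5
```

With the constant weighting equal to one, the size reduces to the number of vertices plus the number of edges.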

The following theorem shows that an S-MCS Model is indeed an MCS Model.

Theorem 2. The S-MCS Model is an MCS Model.

Proof. It can be verified that $\sqsubseteq$ is a partial order on $G$. Here we are only going to show (S1), (S2), (A1) and (A2).

(S1) Let $g_1 = (V_1, E_1, \ell_{V_1}, \ell_{E_1})$ and $g_2 = (V_2, E_2, \ell_{V_2}, \ell_{E_2})$ be graphs in $G$. If $g_1 \sqsubseteq g_2$ then there is an isomorphism $\phi$ between $g_1$ and $g_2'$, a subgraph of $g_2$. It should be clear that $s(g_1) = s(g_2')$, since for every vertex and edge in $g_1$ there is a $\phi$-corresponding, equally labeled, vertex and edge in $g_2'$ and vice versa. As the vertex and edge sets of $g_2'$ are subsets of $V_2$ and $E_2$, we have $s(g_2') \leq s(g_2)$, since the vertex and edge sums in $s_{GVE\alpha}$ would run over these subsets. From this we can conclude $s_{GVE\alpha}(g_1) \leq s_{GVE\alpha}(g_2)$.

(S2) Consider the same setup as in (S1) above: $g_1 \sqsubseteq g_2$ and $g_1$ isomorphic to a subgraph $g_2'$ of $g_2$. Add the extra hypothesis that $s_{GVE\alpha}(g_1) = s_{GVE\alpha}(g_2)$. This implies, as $s_{GVE\alpha}(g_1) = s_{GVE\alpha}(g_2')$, that $s_{GVE\alpha}(g_2') = s_{GVE\alpha}(g_2)$. Since $\alpha$ is strictly positive, the vertices and edges of $g_2'$ must be exactly $V_2$ and $E_2$. In other words, $g_2' = g_2$ and $g_1$ is isomorphic to $g_2$, which in our case is the same as $g_1 = g_2$.

(A1) We are going to show that (A1') holds. In fact, the empty graph is a subgraph of any other graph. This implies that, for any pair $g_1, g_2 \in G$, we have $\{\text{empty graph}\} \subseteq cs(\{g_1, g_2\})$ and, consequently, $0 < 1 \leq |cs(\{g_1, g_2\})|$. Also, by our definition, graphs have a finite number of vertices and edges. This implies that the set of subgraphs of a graph is also finite (the vertex and edge sets of a graph have finitely many subsets). As we consider isomorphic graphs to be equal, we know that $cs(\{g_1, g_2\}) \subseteq cs(\{g_1\})$, the set of subgraphs of $g_1$. As the right-hand set in this inclusion is finite, $cs(\{g_1, g_2\})$ is also finite.
(A2) Let $g_1, g_2 \sqsubseteq h$. Let $\phi_1$ be an isomorphism between $g_1$ and a subgraph $h_1 = (V_{h_1}, E_{h_1}, \ell_{V_{h_1}}, \ell_{E_{h_1}})$ of $h$, and let $\phi_2$ be an isomorphism between $g_2$ and a subgraph $h_2 = (V_{h_2}, E_{h_2}, \ell_{V_{h_2}}, \ell_{E_{h_2}})$ of $h$. Define $h_{12}$ to be another subgraph of $h$ whose vertices and edges are, respectively, $V_{h_1} \cap V_{h_2}$ and $E_{h_1} \cap E_{h_2}$. The vertex and edge labels of $h_{12}$ are chosen to match the ones in $h$. With this construction of $h_{12}$ it can be verified that $h_{12} \sqsubseteq g_1, g_2$ and that

$$s_{GVE\alpha}(h) \geq s_{GVE\alpha}(g_1) + s_{GVE\alpha}(g_2) - s_{GVE\alpha}(h_{12}).$$

To see this last inequality one should only notice that every vertex and edge of $h$ that was counted twice in the sum $s_{GVE\alpha}(g_1) + s_{GVE\alpha}(g_2)$ is discounted once when we subtract $s_{GVE\alpha}(h_{12})$.

This completes the proof of Theorem 2. □

Theorem 2 remains true if we use directed graphs instead of undirected ones. Hence, we conclude, as a corollary of Theorem 2, that $(G, \sqsubseteq, s_{GVE\alpha_1})$, where $\alpha_1$ denotes the constant function equal to one, is an MCS Model, and hence $d_p$ is a metric on $G$ (since, by Theorem 1, $d_a$ is a metric on $G$); with this we reobtain the result in [3]. Furthermore, again by Theorem 1, $d_c$ and $d_d$ are metrics on $G$, which yields versions of $d_B$ and $d_W$ based on subgraphs (note that the size function here is the sum of the number of vertices and the number of edges).

It is worth noting that Theorem 2 allows the use of different label weighting functions, which makes it possible to encode application domain knowledge in the MCS Model definition and hence in the metrics of Theorem 1.

4.3 Induced Subgraph MCS Model 

The second example of MCS Model on graphs is based on the induced subgraph isomorphic relation $\sqsubseteq_i$ (Definition 12).

Definition 15. (I-MCS Model) An induced subgraph MCS Model or I-MCS Model is a triple

$$(G, \sqsubseteq_i, s_{GV\alpha}),$$

where

• $G$ is the set of graphs (Definition 7) with vertex labels in $\Sigma_V$ and edge labels in $\Sigma_E$. Furthermore, we consider two graphs $g_1, g_2 \in G$ that are isomorphic to be the same graph: $g_1 = g_2$.

• $\sqsubseteq_i$ is the induced subgraph isomorphic relation on $G$ (Definition 12);

• and $s_{GV\alpha} : G \to [0, +\infty)$ is a function based on a label weighting function $\alpha : \Sigma_V \to (0, +\infty)$ and, for $g = (V, E, \ell_V, \ell_E)$, is defined by

$$s_{GV\alpha}(g) = \begin{cases} 0, & \text{if } V = \emptyset; \\ \sum_{v \in V} \alpha(\ell_V(v)), & \text{otherwise.} \end{cases}$$
The following theorem shows that I-MCS Models are indeed MCS Models.

Theorem 3. The I-MCS Model is an MCS Model.

Proof. It can be verified that $\sqsubseteq_i$ is a partial order on $G$. The arguments to show that (S1), (S2) and (A1) are valid here are essentially the same as in the proof of Theorem 2 if we replace the term “subgraph” by “induced subgraph” and $s_{GVE\alpha}$ by $s_{GV\alpha}$. For (A2) it is sufficient to notice that the construction of $h_{12}$ in the other proof, replacing “subgraph” by “induced subgraph”, yields an induced subgraph of $g_1$ and $g_2$, and that the argument used there to show that the (A2) inequality holds with $h_{12}$ can also be used here. □

Again, Theorem 3 also holds if we deal with directed graphs. Thus, we get as a corollary of Theorem 3 the fact that $(G, \sqsubseteq_i, s_{GV\alpha_1})$ is an MCS Model, where $\alpha_1$ denotes the constant function equal to one. Thus, $d_B$ and $d_W$ are metrics on $G$ (since, by Theorem 1, $d_c$ and $d_d$ are metrics on $G$); with this we reobtain the results by [5] and [?]. Furthermore, since $d_a$ is a metric on $G$, we get a version of $d_p$ based on the induced subgraph relation (using as size function the number of vertices). Also, as in the previous case, the use of different label weighting functions allows application domain knowledge to be used in the definition of the metrics (similarity notion).


[Figure 3 graphic: panel (a) SubgraphMCS, panel (b) InducedSubgraphMCS.]

Figure 3: Effect of $\alpha$ (label weighting) on the notion of maximum common subelement of (a) S-MCS Model and (b) I-MCS Model. Vertex labels are letters superscripted with their $\alpha$ values. Edge labels match their $\alpha$ value in (a) and are omitted in (b), since they are not considered by $s_{GV\alpha}$. Note that, depending on $\alpha$, the maximum common subelement changes.

In Figure 3 we illustrate the effect on the notion of maximum common subelement (subgraphs and induced subgraphs) caused by using different label weighting functions $\alpha$.
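The dependence of the maximum common subgraph on $\alpha$ can also be observed on a toy instance. The brute-force search below is only a sketch for very small graphs, written under the assumptions that graphs are pairs of label dictionaries and that no label is `None`; the function name and the example graphs are hypothetical and are not the ones drawn in Figure 3.

```python
from itertools import combinations, permutations

def max_common_subgraph(g1, g2, alpha):
    """Brute-force search; returns (size, sorted labels of the kept edges)."""
    (vl1, el1), (vl2, el2) = g1, g2
    best_size, best_edges = 0, []  # the empty graph is always common
    for k in range(1, min(len(vl1), len(vl2)) + 1):
        for dom in combinations(vl1, k):
            for img in permutations(vl2, k):
                m = dict(zip(dom, img))  # injective partial vertex map
                if any(vl1[u] != vl2[m[u]] for u in dom):
                    continue
                # Since alpha > 0, keep every edge matched with equal labels.
                kept = []
                for u, v in combinations(dom, 2):
                    e1, e2 = frozenset((u, v)), frozenset((m[u], m[v]))
                    if e1 in el1 and e2 in el2 and el1[e1] == el2[e2]:
                        kept.append(el1[e1])
                size = sum(alpha(vl1[u]) for u in dom) + sum(map(alpha, kept))
                if size > best_size:
                    best_size, best_edges = size, sorted(kept)
    return best_size, best_edges

# g1 is the path A -x- A -y- A; g2 is two disjoint edges A -x- A and A -y- A.
g1 = ({1: "A", 2: "A", 3: "A"}, {frozenset((1, 2)): "x", frozenset((2, 3)): "y"})
g2 = ({"p": "A", "q": "A", "r": "A", "s": "A"},
      {frozenset(("p", "q")): "x", frozenset(("r", "s")): "y"})
favor_x = {"A": 1, "x": 2, "y": 1}.get
favor_y = {"A": 1, "x": 1, "y": 3}.get
print(max_common_subgraph(g1, g2, favor_x))  # (5, ['x'])
print(max_common_subgraph(g1, g2, favor_y))  # (6, ['y'])
```

No common subgraph can contain both edges, so the weighting decides which edge the maximum common subgraph keeps.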
4.4 Extended Subgraph MCS Model


The previous two examples of MCS Models on graphs were based on two well-known partial orders on graphs: subgraphs and induced subgraphs. We now define a third kind of partial order on graphs that we call extended subgraphs. The idea is a simple generalization of the subgraph partial order. Suppose that we fix partial orders on the vertex and edge label sets of a graph. Informally, we say that a graph $g_1$ is an extended subgraph of a graph $g_2$ with respect to these label partial orders if we can fit the structure of $g_1$ into $g_2$ in a way that each aligned vertex and edge has a label in $g_1$ that is a subelement (by the label partial order) of the corresponding aligned element in $g_2$. This informal idea is defined precisely in the following two definitions.

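Definition 16. (Extended Subgraph) Let $g' = (V', E', \ell'_V, \ell'_E)$ and $g = (V, E, \ell_V, \ell_E)$ be graphs with vertex labels in $\Sigma_V$ and edge labels in $\Sigma_E$. If $\preceq_{\Sigma_V}$ is a partial order on $\Sigma_V$ and $\preceq_{\Sigma_E}$ is a partial order on $\Sigma_E$, we say $g'$ is an extended subgraph of $g$ with respect to $\preceq_{\Sigma_V}$ and $\preceq_{\Sigma_E}$ if

$$V' \subseteq V, \quad E' \subseteq E \cap [V']^2,$$

$$\ell'_V(v) \preceq_{\Sigma_V} \ell_V(v), \text{ for } v \in V',$$

$$\ell'_E(e) \preceq_{\Sigma_E} \ell_E(e), \text{ for } e \in E'.$$

When $\preceq_{\Sigma_V}$ and $\preceq_{\Sigma_E}$ are clear in the context we simply say that $g'$ is an extended subgraph of $g$.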
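Definition 16 can be illustrated with labels that are sets ordered by inclusion. This choice of label order, the workflow-like example and the function name are assumptions made for the sketch; graphs are again pairs of label dictionaries with edges stored as frozensets.

```python
def is_extended_subgraph(gp, g, leq_v, leq_e):
    """Definition 16 with label partial orders given as predicates leq(a, b)."""
    (vlp, elp), (vl, el) = gp, g
    vertices_ok = all(v in vl and leq_v(lab, vl[v]) for v, lab in vlp.items())
    edges_ok = all(e in el and e <= set(vlp) and leq_e(lab, el[e])
                   for e, lab in elp.items())
    return vertices_ok and edges_ok

subset = lambda a, b: a <= b  # set inclusion is a partial order on labels

g = ({"m1": frozenset({"blur", "r=2"}), "m2": frozenset({"crop"})},
     {frozenset(("m1", "m2")): frozenset({"image"})})
gp = ({"m1": frozenset({"blur"}), "m2": frozenset({"crop"})},
      {frozenset(("m1", "m2")): frozenset({"image"})})
bad = ({"m1": frozenset({"sharpen"})}, {})
print(is_extended_subgraph(gp, g, subset, subset))   # True
print(is_extended_subgraph(bad, g, subset, subset))  # False
```

Passing equality as both predicates recovers the ordinary subgraph relation of Definition 8.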

Definition 17. (Extended Subgraph Isomorphic) If $g'$ is isomorphic to a graph that is an extended subgraph of $g$ with respect to $\preceq_{\Sigma_V}$ and $\preceq_{\Sigma_E}$, we say that $g$ is extended subgraph isomorphic to $g'$, and denote this fact by $g' \sqsubseteq_e g$.

We are now able to define the third example of MCS Model on graphs, based on the extended subgraph isomorphic relation $\sqsubseteq_e$ (Definition 17).

Definition 18. (E-MCS Model) Let $M_V$ and $M_E$ be MCS Models on $\Sigma_V$ and $\Sigma_E$,

$$M_V = (\Sigma_V, \preceq_{\Sigma_V}, s_{\Sigma_V}),$$

$$M_E = (\Sigma_E, \preceq_{\Sigma_E}, s_{\Sigma_E}),$$

with size functions being strictly positive: $s_{\Sigma_V} > 0$ and $s_{\Sigma_E} > 0$. An extended subgraph MCS Model or E-MCS Model with respect to $M_V$ and $M_E$ is a triple

$$(G, \sqsubseteq_e, s_{GEs})$$

where

• $G$ is the set of graphs (Definition 7) with vertex labels in $\Sigma_V$ and edge labels in $\Sigma_E$. Furthermore, we consider two graphs $g_1, g_2 \in G$ that are isomorphic to be the same graph: $g_1 = g_2$.

• $\sqsubseteq_e$ is the extended subgraph isomorphic relation on $G$ with respect to $\preceq_{\Sigma_V}$ and $\preceq_{\Sigma_E}$ (Definition 17);

• and for $g = (V, E, \ell_V, \ell_E) \in G$,

$$s_{GEs}(g) = \begin{cases} 0, & \text{if } V = \emptyset; \\ \sum_{v \in V} s_{\Sigma_V}(\ell_V(v)) + \sum_{e \in E} s_{\Sigma_E}(\ell_E(e)), & \text{otherwise.} \end{cases}$$
The following theorem shows that E-MCS Models are indeed MCS Models.

Theorem 4. The E-MCS Model is an MCS Model.

Proof. The proof goes as follows:

(S1) Let $g_1, g_2$ be graphs such that $g_1 \sqsubseteq_e g_2$. By definition of $\sqsubseteq_e$, there exists an extended subgraph $g_2'$ of $g_2$ that is isomorphic to $g_1$, and, clearly, $s_{GEs}(g_1) = s_{GEs}(g_2')$. Since $s_{GEs}(g_2')$ is a sum running over a subset of the vertices and edges of the sum in $s_{GEs}(g_2)$, and each vertex or edge in the sum of $s_{GEs}(g_2')$ yields a value smaller than or equal to the corresponding one in the sum of $s_{GEs}(g_2)$, we can conclude that (S1) is valid.

(S2) Let $g_1 \sqsubseteq_e g_2$ and $s(g_1) = s(g_2)$. By definition of $\sqsubseteq_e$, there exists an extended subgraph $g_2'$ of $g_2$ that is isomorphic to $g_1$ and, clearly, $s_{GEs}(g_1) = s_{GEs}(g_2')$. This implies $s_{GEs}(g_2') = s_{GEs}(g_2)$. As $g_2'$ is an extended subgraph of $g_2$ and the label size functions are strictly positive, the only option to make $s_{GEs}(g_2') = s_{GEs}(g_2)$ is to have $g_2' = g_2$.

(A1) Again, let $g_1 = (V_1, E_1, \ell_{V_1}, \ell_{E_1})$ and $g_2 = (V_2, E_2, \ell_{V_2}, \ell_{E_2})$ be two graphs. The fact that the empty graph is a subgraph of any graph implies that $\{\text{empty graph}\} \subseteq cs(\{g_1, g_2\})$ and, consequently, $0 < |cs(\{g_1, g_2\})|$. In order to prove that the set $\{s(g) \mid g \sqsubseteq_e g_1, g_2\}$ has a maximum, let $g_1^0 = (V_1, E_1)$ and $g_2^0 = (V_2, E_2)$ be the unlabelled copies (same structure) of $g_1$ and $g_2$, respectively. The set of common subgraphs (by the subgraph isomorphic relation) of $g_1^0$ and $g_2^0$ is finite, as in the proof of Theorem 2. Denote this set by $cs(\{g_1^0, g_2^0\}) = \{h_1, \ldots, h_n\}$. For each $h_i$, let $\Phi_i = \{\phi_i^1, \ldots, \phi_i^{k_i}\}$ be the set of all subgraph isomorphisms between $h_i$ and $g_1^0$, each mapping $h_i$ onto a subgraph of $g_1^0$. Similarly, let $\Psi_i = \{\psi_i^1, \ldots, \psi_i^{m_i}\}$ be the set of all subgraph isomorphisms between $h_i$ and $g_2^0$. Now, for each $s = 1, \ldots, k_i$ and $t = 1, \ldots, m_i$, the map $\psi_i^t \circ (\phi_i^s)^{-1}$ defines an isomorphism from a subgraph $g_1^{st}$ of $g_1^0$ to a subgraph $g_2^{st}$ of $g_2^0$. Finally, denote by $g_{st}$ the extended subgraph of $g_1$ and $g_2$ that has the same vertex and edge sets as $g_1^{st}$ and whose labels are defined as follows: for each vertex $v$ and edge $e$ of $g_{st}$, define its label as an element of $mcs(\{\ell_{V_1}(v), \ell_{V_2}(\psi_i^t \circ (\phi_i^s)^{-1}(v))\})$ and $mcs(\{\ell_{E_1}(e), \ell_{E_2}(\psi_i^t \circ (\phi_i^s)^{-1}(e))\})$, respectively. Now let $s_0$ and $t_0$ (together with the corresponding index $i$) be such that $s_{GEs}(g_{s_0 t_0}) = \max(\{s_{GEs}(g_{st}) \mid i = 1, \ldots, n,\ s = 1, \ldots, k_i \text{ and } t = 1, \ldots, m_i\})$. By construction, such $g_{s_0 t_0}$ is the extended subgraph of $g_1$ and $g_2$ with maximum size.
(A2) Let $g_1, g_2 \sqsubseteq_e h$. Let $\phi_1$ be an isomorphism between $g_1$ and an extended subgraph $h_1 = (V_{h_1}, E_{h_1}, \ell_{V_{h_1}}, \ell_{E_{h_1}})$ of $h$, and let $\phi_2$ be an isomorphism between $g_2$ and an extended subgraph $h_2 = (V_{h_2}, E_{h_2}, \ell_{V_{h_2}}, \ell_{E_{h_2}})$ of $h$. Define $h_{12}$ to be another extended subgraph of $h$ whose vertices and edges are, respectively, $V_{h_1} \cap V_{h_2}$ and $E_{h_1} \cap E_{h_2}$. The label of a vertex $v$ of $h_{12}$ is defined as follows: let $\sigma_h, \sigma_{h_1}, \sigma_{h_2}$ be the label of $v$ in, respectively, $h, h_1, h_2$; by the fact that $h_1, h_2$ are extended subgraphs of $h$, we have that $\sigma_{h_1}, \sigma_{h_2} \preceq_{\Sigma_V} \sigma_h$; by axiom (A2) in $M_V$ there exists $\sigma_{h_{12}} \in \Sigma_V$ with $\sigma_{h_{12}} \preceq_{\Sigma_V} \sigma_{h_1}, \sigma_{h_2}$ such that $s_{\Sigma_V}(\sigma_h) \geq s_{\Sigma_V}(\sigma_{h_1}) + s_{\Sigma_V}(\sigma_{h_2}) - s_{\Sigma_V}(\sigma_{h_{12}})$; define the label of $v$ in $h_{12}$ to be $\sigma_{h_{12}}$. The label of an edge of $h_{12}$ is defined in an analogous way. With this construction of $h_{12}$ it can be verified that $h_{12} \sqsubseteq_e g_1, g_2$ and that

$$s_{GEs}(h) \geq s_{GEs}(g_1) + s_{GEs}(g_2) - s_{GEs}(h_{12}).$$

The proof of Theorem 4 is complete. □

When modeling real-world concepts using graphs, it is usually important to have flexibility when defining what information a vertex or an edge will carry. For example, in scientific workflow descriptions vertices represent parameterized modules that perform some kind of computation. Usually a single module is configured with a set of parameters and values that are not adequately represented by a single symbol but, instead, by a more complicated object. The nesting property of E-MCS Models is interesting in this setting: it enables plugging elements of other MCS Models in as labels of vertices and edges, and deriving metrics for these objects that take into account all parts that form the final object.
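The nesting property can be sketched by letting each vertex label be a small graph whose own size comes from an S-MCS style size function. Everything below is a hypothetical illustration: the workflow, the module graphs, and the choice of unit weights are assumptions, and the inner label graphs are kept non-empty so that label sizes stay strictly positive as Definition 18 requires.

```python
def size_gve(g, alpha):
    """S-MCS size of g = (vertex_labels, edge_labels) under weighting alpha."""
    vl, el = g
    if not vl:
        return 0
    return sum(alpha(l) for l in vl.values()) + sum(alpha(l) for l in el.values())

def size_ges(g, s_v, s_e):
    """E-MCS size: each label contributes its size in the label MCS Model."""
    vl, el = g
    if not vl:
        return 0
    return sum(s_v(l) for l in vl.values()) + sum(s_e(l) for l in el.values())

unit = lambda label: 1

# Hypothetical workflow: each module (vertex label) is itself a small graph.
blur = ({"op": "blur", "radius": "param"}, {frozenset(("op", "radius")): "has"})
crop = ({"op": "crop"}, {})
workflow = ({"m1": blur, "m2": crop}, {frozenset(("m1", "m2")): "image"})

inner_size = lambda label_graph: size_gve(label_graph, unit)
print(size_ges(workflow, inner_size, unit))  # 3 + 1 + 1 = 5
```

The outer size is assembled entirely from the sizes of the nested labels, which is what lets a metric on the outer graphs account for the inner structure.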

To illustrate E-MCS Models, we will use them, in the next section, to build a link between Graph Edit Distance and MCS Models. Before that we need an additional property of E-MCS Models, which states that if we restrict the elements (graphs) of an E-MCS Model to complete graphs of $n$ vertices, we still have an MCS Model. We will refer to this MCS Model as an n-restricted E-MCS Model.

Proposition 2. Let $M = (G, \sqsubseteq_e, s_{GEs})$ be an E-MCS Model with respect to $M_V = (\Sigma_V, \preceq_{\Sigma_V}, s_{\Sigma_V})$ and $M_E = (\Sigma_E, \preceq_{\Sigma_E}, s_{\Sigma_E})$. Let $K_n$ be the subset of $G$ formed of complete graphs with $n$ vertices. Let $\preceq_{K_n}$ and $s_{K_n}$ be the restrictions of $\sqsubseteq_e$ and $s_{GEs}$ to $K_n$. In this context, the triple $M_{K_n} = (K_n, \preceq_{K_n}, s_{K_n})$ is also an MCS Model.

Proof. Properties (R1), (R2) and (R3) clearly hold for $\preceq_{K_n}$, since they are valid for $\sqsubseteq_e$. Similarly, the properties (S1) and (S2) hold for $s_{K_n}$, since they hold for $s_{GEs}$.

• (A1) Let $g_1 = (V_1, E_1, \ell_{V_1}, \ell_{E_1}), g_2 = (V_2, E_2, \ell_{V_2}, \ell_{E_2}) \in K_n$. Let $\phi$ be a bijection between $V_1$ and $V_2$. Then $\phi$ defines a one-to-one correspondence between the vertices and edges of $g_1$ and those of $g_2$. We can therefore define a graph $g_{12} \in K_n$ by using the same vertex and edge sets as in $g_1$ and defining the label of each vertex $v \in V_1$ as an element of $cs(\{\ell_{V_1}(v), \ell_{V_2}(\phi(v))\})$ and the label of each edge $e \in E_1$ as an element of $cs(\{\ell_{E_1}(e), \ell_{E_2}(\phi(e))\})$. Now let $\{\phi_1, \ldots, \phi_{n!}\}$ be the set of all bijections between $V_1$ and $V_2$. For each bijection $\phi_k$, we can define an extended subgraph $g_{12}^k$ of $g_1$ and $g_2$ as before, but choosing as label for each vertex an element of $mcs(\{\ell_{V_1}(v), \ell_{V_2}(\phi_k(v))\})$ and for each edge an element of $mcs(\{\ell_{E_1}(e), \ell_{E_2}(\phi_k(e))\})$. Let $k_0$ be the index associated with the largest graph among the $g_{12}^k$, i.e., $s(g_{12}^{k_0}) = \max(\{s_{K_n}(g_{12}^k) \mid k = 1, \ldots, n!\})$. By construction, $s(g_{12}^{k_0}) = \max(\{s_{K_n}(g) \mid g \in cs(\{g_1, g_2\})\})$.

• (A2) Let $g_1 = (V_1, E_1, \ell_{V_1}, \ell_{E_1}), g_2 = (V_2, E_2, \ell_{V_2}, \ell_{E_2}) \in K_n$ and let also $g = (V, E, \ell_V, \ell_E) \in K_n$ be such that $g_1, g_2 \preceq_{K_n} g$. We want to define a complete graph $g_{12} = (V_{12}, E_{12}, \ell_{V_{12}}, \ell_{E_{12}})$ such that $g_{12} \preceq_{K_n} g_1, g_2 \preceq_{K_n} g$ and also that $s(g) \geq s(g_1) + s(g_2) - s(g_{12})$. In order to do so, we fix one correspondence $\phi_{12}$ (bijection) between $V_{12}$ and $V_1$ and another one $\phi$ between $V_1$ and $V_2$. We are going to denote $v \in V_{12}$, $\phi_{12}(v)$ and $\phi(\phi_{12}(v))$ just by $v$. We define the labels of $g_{12}$ as follows. For each $v \in V_{12}$ we know that $\ell_{V_1}(v), \ell_{V_2}(v) \preceq_{\Sigma_V} \ell_V(v)$; then we can use the axiom (A2) for the MCS Model $(\Sigma_V, \preceq_{\Sigma_V}, s_{\Sigma_V})$ and conclude that there exists a label $\sigma_{12}$ such that $\sigma_{12} \preceq_{\Sigma_V} \ell_{V_1}(v), \ell_{V_2}(v) \preceq_{\Sigma_V} \ell_V(v)$ and $s_{\Sigma_V}(\ell_V(v)) \geq s_{\Sigma_V}(\ell_{V_1}(v)) + s_{\Sigma_V}(\ell_{V_2}(v)) - s_{\Sigma_V}(\sigma_{12})$. We define $\ell_{V_{12}}(v) = \sigma_{12}$. With a similar construction, we can define the edge label function $\ell_{E_{12}}$. One can verify that $g_{12}$ constructed this way satisfies the axiom (A2).


□ 


5 Relation between Graph Edit Distance and 
MCS Models 

In this section we show a relation between graph edit distance and MCS Models. Informally speaking, this connection states that if $d_{GED}$ is a graph edit distance and a metric on $G$, then there is a corresponding MCS Model $M$ such that

$$d_{GED}(g_1, g_2) = d_a(\theta(g_1), \theta(g_2)),$$

where $g_1, g_2 \in G$, $\theta$ takes the elements of $G$ into their corresponding elements in $M$, and $d_a$ is the first of the four metrics in Theorem 1 valid in $M$. Thus, the MCS Model $M$ encodes $d_{GED}$. The problem of finding the graph edit distance between $g_1$ and $g_2$ becomes the problem of finding a maximum common subelement between $\theta(g_1)$ and $\theta(g_2)$ in $M$.

Before stating the main result of this section, we define precisely what we mean by graph edit distance. We use the notion of graph completion in this definition to facilitate the exposition: for any two graphs we can refer to bijections between the vertices of their completed versions instead of having to deal with functions from subsets of the vertices of the first graph into the vertices of the second graph. This definition of graph edit distance is equivalent to the common use of the term, where the cost of each vertex and edge operation (i.e., addition, deletion, and substitution) is based on labels and these operation costs are known a priori.

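Definition 19. (Graph Edit Distance) Let $g_1$ and $g_2$ be graphs with vertex sets $V_1$ and $V_2$, vertex labels in $\Sigma_V$ and edge labels in $\Sigma_E$. Furthermore, let

$$c_V : (\Sigma_V \cup \{\epsilon_V\})^2 \to [0, \infty],$$

$$c_E : (\Sigma_E \cup \{\epsilon_E\})^2 \to [0, \infty]$$

be, respectively, edit cost functions on vertex and edge labels, where $\epsilon_V$ and $\epsilon_E$ are special labels. Assume that

$$g_1' = K^{\epsilon_V, \epsilon_E}_{|V_1| + |V_2|}(g_1) = (V_1', E_1', \ell'_{V_1}, \ell'_{E_1}),$$

$$g_2' = K^{\epsilon_V, \epsilon_E}_{|V_1| + |V_2|}(g_2) = (V_2', E_2', \ell'_{V_2}, \ell'_{E_2}).$$

Let $F$ be the set of bijections from $V_1'$ to $V_2'$. The cost $c(f)$ for $f \in F$ is defined as

$$c(f) = \sum_{v \in V_1'} c_V(\ell'_{V_1}(v), \ell'_{V_2}(f(v))) + \sum_{e \in E_1'} c_E(\ell'_{E_1}(e), \ell'_{E_2}(f(e))). \qquad (15)$$

In this context, we define the graph edit distance between $g_1$ and $g_2$ as

$$d_{GED}(g_1, g_2) = \min_{f \in F} c(f).$$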
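Definition 19 translates directly into an exhaustive search over bijections, which is only feasible for tiny graphs. The sketch below is an illustration under assumptions: graphs are pairs of label dictionaries, padding vertices are named `("pad", i)` (assumed not to clash with real vertex names), the strings `"eps_V"` and `"eps_E"` stand in for the special labels, and the unit cost function is just one possible metric on labels.

```python
from itertools import combinations, permutations

EPS_V, EPS_E = "eps_V", "eps_E"  # stand-ins for the special labels

def complete(vl, el, n):
    """Graph n-completion: pad with eps-labeled vertices and edges."""
    vl = dict(vl)
    for i in range(n - len(vl)):
        vl[("pad", i)] = EPS_V
    full = {frozenset(p): EPS_E for p in combinations(vl, 2)}
    full.update({frozenset(e): lab for e, lab in el.items()})
    return vl, full

def ged(g1, g2, c_v, c_e):
    """Minimum of c(f), Equation (15), over all bijections f from V1' to V2'."""
    n = len(g1[0]) + len(g2[0])
    (vl1, el1), (vl2, el2) = complete(*g1, n), complete(*g2, n)
    v1 = list(vl1)
    best = float("inf")
    for image in permutations(vl2):
        f = dict(zip(v1, image))
        cost = sum(c_v(vl1[v], vl2[f[v]]) for v in v1)
        cost += sum(c_e(el1[frozenset((u, v))], el2[frozenset((f[u], f[v]))])
                    for u, v in combinations(v1, 2))
        best = min(best, cost)
    return best

unit_cost = lambda a, b: 0 if a == b else 1  # discrete metric on labels

g1 = ({"a": "A", "b": "B"}, {frozenset(("a", "b")): "x"})
g2 = ({"p": "A"}, {})
print(ged(g1, g2, unit_cost, unit_cost))  # 2: delete vertex B and edge x
print(ged(g1, g1, unit_cost, unit_cost))  # 0
```

The search visits $(|V_1| + |V_2|)!$ bijections, so it serves only to make the definition tangible.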

Some uses of the term graph edit distance refer to a more general idea. For 
example, Jp] shows a correspondence between the maximum number of vertices 
of a common induced subgraph and a specific graph edit distance notion where 
the edge operation cost depends on which operation was done in its end vertices. 
Now we are able to state the main result of this section. 

Theorem 5 (GED and MCS Model). Let $G_n$ be the set of graphs with $n$ or fewer vertices on finite label sets $\Sigma_V$ and $\Sigma_E$. Let $c_V : (\Sigma_V \cup \{\epsilon_V\})^2 \to [0, \infty)$ and $c_E : (\Sigma_E \cup \{\epsilon_E\})^2 \to [0, \infty)$ be edit cost functions. Furthermore, let $c_V$ and $c_E$ be metrics on $\Sigma_V \cup \{\epsilon_V\}$ and $\Sigma_E \cup \{\epsilon_E\}$. Then, there exists an MCS Model

$$M_n = (X, \preceq_X, s_X)$$

and an injective function $\theta : G_n \to X$ such that

$$d_{GED}(g_1, g_2) = s_X(\theta(g_1)) + s_X(\theta(g_2)) - 2 s'_X(\{\theta(g_1), \theta(g_2)\}).$$

Proof. Apply Lemma 1 on the finite metric spaces $(\Sigma_V \cup \{\epsilon_V\}, c_V)$ and $(\Sigma_E \cup \{\epsilon_E\}, c_E)$ to obtain corresponding MCS Models $M_V = (\Sigma'_V, \preceq_{\Sigma'_V}, s_{\Sigma'_V})$ and $M_E = (\Sigma'_E, \preceq_{\Sigma'_E}, s_{\Sigma'_E})$, where the size of the smallest element in these MCS Models is strictly positive. Let $G'$ be the set of graphs with labels in $\Sigma'_V$ and $\Sigma'_E$. Observe that the triple $(G', \sqsubseteq_e, s_{GEs})$ with respect to $M_V$ and $M_E$ is an E-MCS Model. Define $X = K_{2n}$ to be the subset of $G'$ consisting only of complete graphs with $2n$ vertices. By Proposition 2 we know that $M_n = (X, \preceq_X, s_X)$ is an MCS Model, when $\preceq_X$ and $s_X$ are the restrictions of $\sqsubseteq_e$ and $s_{GEs}$ to the set $X$. Assume $g_1, g_2 \in G_n$ and that their vertex sets are, respectively, $V_1$ and $V_2$. By definition, the graph edit distance between $g_1$ and $g_2$ is the minimum value of the function $c$ (Equation 15) for a vertex bijection between $K^{\epsilon_V, \epsilon_E}_{|V_1| + |V_2|}(g_1)$ and $K^{\epsilon_V, \epsilon_E}_{|V_1| + |V_2|}(g_2)$. It can be checked that for our metrics $c_V$ and $c_E$ this minimum value of the function $c$ is the same if we consider vertex bijections between $K^{\epsilon_V, \epsilon_E}_{2n}(g_1)$ and $K^{\epsilon_V, \epsilon_E}_{2n}(g_2)$. Define $\theta : G_n \to X$ to be the graph completion $K^{\epsilon_V, \epsilon_E}_{2n}$. Make $x_1 = \theta(g_1) = (V_1', E_1', \ell'_{V_1}, \ell'_{E_1})$ and $x_2 = \theta(g_2) = (V_2', E_2', \ell'_{V_2}, \ell'_{E_2})$. Let $f$ be a bijection between the vertices of $x_1$ and $x_2$. Define $x_f \in X$ in the following way: for every vertex $v$ in $x_1$ there corresponds a vertex in $x_f$ labeled with an element (any element) of $mcs(\{\ell'_{V_1}(v), \ell'_{V_2}(f(v))\})$, and for every edge $e$ in $x_1$ there corresponds an edge in $x_f$ labeled with an element (any element) of $mcs(\{\ell'_{E_1}(e), \ell'_{E_2}(f(e))\})$. Using this construction for $x_f$ it is clear that $x_f \preceq_X x_1, x_2$, and it can be verified that $c(f) = s_X(x_1) + s_X(x_2) - 2 s_X(x_f)$. Let $f_0$ be the bijection between the vertices of $x_1$ and $x_2$ that yields the graph edit distance between $g_1$ and $g_2$. At this point we can write

$$d_{GED}(g_1, g_2) = c(f_0) = s_X(x_1) + s_X(x_2) - 2 s_X(x_{f_0}).$$

To conclude the proof it remains to show that $s_X(x_{f_0}) = s'_X(\{x_1, x_2\})$. Assume $s_X(x_{f_0}) < s'_X(\{x_1, x_2\})$ and $x_{12} \in mcs(\{x_1, x_2\})$. Let $\phi_1$ and $\phi_2$ be isomorphisms between $x_{12}$ and extended subgraphs of $x_1$ and $x_2$. Define $f_{12} = \phi_2 \circ \phi_1^{-1}$. Note that $f_{12}$ is a bijection between the vertices of $x_1$ and of $x_2$ and that $s_X(x_{12}) = s_X(x_{f_{12}})$. In this case, we can write $s_X(x_{f_0}) < s'_X(\{x_1, x_2\}) = s_X(x_{12}) = s_X(x_{f_{12}})$, for the bijection $f_{12}$. This contradicts the hypothesis that $s_X(x_{f_0})$ is the maximum possible for a bijection between the vertices of $x_1$ and $x_2$. The theorem is proven. □

An interesting aspect of this theorem is that it gives a different and precise materialization of the meaning of a metric graph edit distance between two graphs: we can see it encoded in an element of a corresponding MCS Model. One application of this connection is the interpretation of natural notions in the MCS Model in terms of the original metric graph edit distance. For example, a maximum common subelement of three or more elements of the MCS Model could correspond to a natural generalization of the metric graph edit distance between three or more graphs.


6 Conclusions 

In this paper we have introduced the MCS Model, which is a generalization of a model proposed by [3]. We then showed four metric functions to be valid in any MCS Model (three additional metrics to the one shown in [2]). The usefulness of the MCS Model is that it serves as a template to fit into applied scenarios and eases the derivation of metrics (precise similarity notions) in those scenarios. We show this usage of MCS Models by presenting three examples on graphs: the S-MCS (based on subgraphs), I-MCS (based on induced subgraphs), and E-MCS (based on a less common partial order that we name extended subgraphs). With these examples we are able to reproduce and extend previously reported metrics on graphs, as well as obtain new ones (e.g., subgraph versions of $d_B$ and $d_W$). The E-MCS Model has an interesting nesting property that allows one to derive distance metrics for graphs with complex labels, which might be of considerable value when modeling real scenarios. A final contribution of this paper is an interpretation of the graph edit distance that is a metric on graphs as, essentially, a maximum common subelement on a corresponding MCS Model.